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Abstract 



The gene networks that comprise the circadian clock modulate biological function across a range 
of scales, from gene expression to performance and adaptive behaviour. The clock functions by 
generating endogenous rhythms that can be entrained to the external 24-hour day/night cycle, 
enabling organisms to optimally time biochemical processes relative to dawn and dusk. In recent 
years, computational models based on differential equations have become useful tools for dissecting 
and quantifying the complex regulatory relationships underlying the clock's oscillatory dynamics. 
However, optimising the large parameter sets characteristic of these models places intense demands 
on both computational and experimental resources, limiting the scope of in silico studies. 

Here, we develop an approach based on Boolean logic that dramatically reduces the parametri- 
sation, making the state and parameter spaces finite and tractable. We introduce efficient methods 
for fitting Boolean models to molecular data, successfully demonstrating their application to syn- 
thetic time courses generated by a number of established clock models, as well as experimental 
expression levels measured using luciferase imaging. Our results indicate that despite their relative 
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simplicity, logic models can: (i) simulate circadian oscillations with the correct, experimentally- 
observed phase relationships amongst genes; and (ii) flexibly entrain to light stimuli, reproducing 
the complex responses to variations in daylength generated by more detailed differential equation 
formulations. Our work also demonstrates that logic models have sufficient predictive power to 
identify optimal regulatory structures from experimental data. 

By presenting the first Boolean models of circadian circuits together with general techniques 
for their optimisation, we hope to establish a new framework for the systematic modelling of more 
complex clocks, as well as other circuits with different qualitative dynamics. In particular, we 
anticipate that the ability of logic models to provide a computationally efficient representation of 
system behaviour could greatly facilitate the reverse-engineering of large-scale biochemical networks. 



1 Introduction 



Circadian rhythms are the fundamental daily oscillations in metabolism, physiology and behaviour 
that occur in almost all organisms, ranging from cyanobacteria to humans (l). The gene regulatory 
networks (GRNs), or clocks, that generate these rhythms regulate the expression of associated genes 
in roughly 24-hour cycles. Circadian networks have been studied in a variety of experimentally 
tractable model systems, revealing that different organisms share structurally similar circuits based 
around interlocking sets of positive and negative gene-protein feedback loops ■ In mammals, 
circadian rhythms are being increasingly recognised as important to healthy phenotypes, playing a 
role in ageing Q , cancer Q , vascular disease Q and psychiatric disorders Q , as well as modulating 
innate immunity [9- 

Clocks synchronise to their environment by using light and temperature to regulate the levels 
of one or more components of the feedback loops. This ensures that key biological processes 
are optimised relative to dawn and dusk, benefiting growth and survival |l 2L [3| • F° r the clock to 
provide such an adaptive advantage, the phase must change appropriately when the clock is subject 
to regular perturbations - particularly seasonal changes in daylength (the photoperiod) . However, as 
well as exhibiting flexible responses to variations in the input light signal, the clock must also exhibit 
robustness to irregular perturbations, such as genetic mutations and the intrinsically stochastic 
environment of the cell. 

Temperature also plays a critical role as an environmental time cue. Across different species, the 
clock is relatively insensitive to temperature in that the period of free-running oscillations typically 
has a Qiq value close to I 14-1^. This latter phenomenon, known as temperature compensation, 



is generally considered to be one of the defining properties of the circadian clock and has been 
suggested to be a key requirement for stability of the clock's phase relationship under seasonal 
temperature variations [fT], [3] . 

The ability of ordinary and delay differential equations (DEs) to reproduce the underlying 
continuous dynamics of biochemical networks, and to parametrise individual reactions, has led to 
the construction of DE models in a number of circadian organisms. These include the fungus 



and the higher plant Arabidopsis thaliana [34M38I ] Such models have proven useful in 



Neurospora crassa 19H25J, the fly Drosophila melanogaster 



the mammal Mus musculus 



uncovering the general design principles of circadian oscillators, as well as providing a quantitative 
framework within which to interpret experimental results [4, 38 1. In particular, novel insights have 



been gained into the mechanisms promoting robustness with respect to photoperiod changes [25| . 
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temperature fluctuations 0,0] and molecular noise 39 - 41 1 . The DE models have also yielded 
experimentally-testable predictions that have lead to the discovery of novel circadian regulators 



36|. 

However, a significant drawback of the DE approach is that the values of the kinetic parameters 
controlling each individual reaction have to be specified, and for clocks these are typically unknown. 
When constructing a DE model, it is therefore necessary to calculate the particular combination of 



parameter values giving an optimal fit to experimental data [18l |25| . |34| - |37| . For realistic systems 
involving large numbers of reactions, this optimisation procedure is computationally very expensive 
making exhaustive parameter searches intractable. With increasing parameter numbers also comes 
a need for data with which to constrain the optimisation, placing a greater demand on experiment in 
terms of finance, time and ethics. These concerns mean that there is a pressing need for modelling 
approaches that minimise the number of parameters required, whilst adequately capturing the 
essential dynamical behaviour of the system of interest. 

Here we develop just such an approach, based on Boolean logic. In Boolean models, the activity 
of each gene is described with a two-state variable taking the value ON (1) or OFF (0), meaning 
that its products are present or absent respectively. Biochemical interactions are represented by 
simple, binary functions which calculate the state of a gene from the activation state of its upstream 
components 42H50] . This approximation dramatically reduces the state-space of the system, map- 



ping the infinite number of different continuous system states in a DE model to a finite number of 
discrete states in the Boolean equivalent. 

An additional important advantage of using a logic approach is that the total number of pa- 
rameters is substantially reduced. For a given gene, the full set of reactions determining its state 
through a particular interaction is parametrised by a single signalling delay, representing the net 
time taken for these reactions to cause a change in state. Fitting to experimental data introduces 
an associated discretisation threshold, with expression levels above the threshold taken to corre- 



spond to the ON state of the gene and levels below it to the OFF state [46l-l48l. |50( . In fitting a 
particular experimental data set, each delay becomes a multiple of the sampling interval, while only 
a bounded subset of thresholds will yield distinct Boolean expression patterns. This means that the 
total number of parameter combinations is finite and can be enumerated. Thus, by building a logic 
version of a DE model, an infinite model can be converted into a finite one with fewer parameters 
to be optimised. This extends the scale and complexity of GRNs that can be studied by Boolean 
models far beyond the practical scope of DEs. 

In this work, we introduce the first Boolean models of circadian networks. By constructing 
logic analogs of a number of established DE clock models, we demonstrate that in each case the 
Boolean models are capable of accurately reproducing the higher-order properties - particularly 
photoperiod responses - of their DE counterparts. This suggests that the complex, biological 
signal transduction simulated by the DE models can be captured in Boolean equivalents possessing 
significantly smaller parameter sets. We introduce a general method for optimising Boolean models 
that avoids the qualitative and often subjective terms characteristic of the cost functions used to 
fit the parameters of large DE clock models. Furthermore, we show that our fitting algorithm is 
capable of determining the optimal Boolean model configuration associated with a given circuit 
topology and experimental data set. In particular, our algorithm successfully predicts the recently 
discovered repressive action of the circadian gene TO CI on LHY in the central feedback loop of 
the Arabidopsis clock [51] . 
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Taken together, our results show that Boolean models can quantitatively distinguish between 
a range of putative regulatory structures on the basis of the system dynamics. This identifies 
Boolean logic as a viable technique for reverse engineering circadian networks, complementing 
approaches based on DEs. Moreover, our work also suggests novel hybrid modelling approaches 
based on employing Boolean models as a first step towards the construction of more detailed 
DE formulations. More generally, we propose that our methodology provides an efficient way 
of systematically modelling complex signalling pathways, including other oscillatory circuits and 
systems characterised by steady-state dynamics. 



2 Results 

2.1 Logic models employ significantly fewer parameters 

We selected 4 recent circadian oscillator models of increasing complexity with which to assess the 
suitability of a Boolean formulation. The simplest of these was a Neurospora model based on a 



single negative feedback loop with a single light input [19| (Figure [T]^) . This is represented by 
3 differential equations (DEs) parametrised by 13 kinetic constants. The second model was a 
modified version of the 1-loop Neurospora circuit in which there are a pair of negative feedback 
loops associated with different isoforms of the active protein [3| (Fig. HJ3). The extra feedback loop 
results in 5 DEs parametrised by 18 kinetic constants. The third model was an Arabidopsis circuit 



based on a pair of interlocking feedback loops with 3 light inputs 35| (Fig. HP). It is described by 
13 DEs together with 64 kinetic constants. The final model considered was a 3-loop Arabidopsis 
circuit obtained by adding an extra feedback loop and light input to the 2-loop system [3^ (Fig. 
HP). This yields 16 DEs parametrised by 80 kinetic constants. 

In a Boolean model, an interaction j between two components Xi and is quantified by the 
corresponding signalling delay tj. This is the time taken for the biochemical processes represented 
by j to convert a change in the state of Xi into a change in the state of Xk (see Fig. SI of the 
Electronic Supplementary Material). The signalling delays are thus parameters that determine 
the dynamics of the model, with different combinations of delays yielding different attractor states 



(e.g. steady-states or limit cycles) [43J, |44j, [52|, [53j. In order to relate the discrete dynamics to 
the continuous variations in expression level observed experimentally, it is necessary to introduce a 
discretisation process. Supplementary Fig. S2A shows a hypothetical time course and Supp. Fig. 
S2B the result of applying a particular discretisation threshold. In the general case, the choice of 
threshold is dependent on the effective range of activity and expression that is critical for signal 
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propagation. As a result, each threshold also becomes a parameter of the logic model 

For logic descriptions of the circadian networks shown in Fig. [2J each edge is parametrised by 
a signalling delay Tj and each vertex by a threshold T^. Thus, a network is characterised by the 
vectors r = (Tj) and T = (Tj). In Fig. [3l we compare the total number of parameters in the logic 
and DE versions of each network (see also Supplementary Table SI). We can see that dramatically 
fewer parameters are required in a logic description compared to the corresponding DE formulation. 
Indeed for the largest network considered, 3-loop Arabidopsis, the Boolean description reduces the 
number of parameters by a factor of 4. 
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2.2 Logic model configurations consistent with DE models are optimal 

In identifying the logic models that best reproduce the corresponding DE dynamics, we considered 
variations to the structures of the networks shown in Fig. [T] Starting from the abstract topologies 
of Fig. [3J each edge is activating or inhibiting, and where two or more edges lead into a vertex, the 
corresponding inhibition/activation signals are combined to determine the state of the gene. 

In the Boolean formalism, the manner in which regulatory signals affect a gene's expression level 
is represented by the corresponding logic gate. This is a binary function that specifies the current 



state of the gene, ON (1) or OFF (0), for each possible combination of input states [46|, [54[. For 
genes with a single input, there are two possible functions for which the output varies with the 
input: the identity gate, in which the output follows the input (0 — > and 1 — * 1), and the NOT 
gate, in which the input is inverted (0 — » 1 and 1 — * 0). These represent activation and repression of 
the gene by its regulator respectively. Genes with two inputs are commonly modelled using either 



an AND or an OR gate [46j, |49|, |54| . For simplicity, we considered each multi-input gate to be a 
composition of ANDs and ORs (see Methods for further details). The particular combination of 
logic gates used to model a network is referred to here as its logic configuration (LC): this encodes 
the regulatory structure of the network in a compact fashion. 

It follows that each abstract topology of Fig. [2] gives rise to 2 E+M possible LCs, where E is the 
number of edges and M is the number of vertices with more than one input. The total number of 
possible LCs is 4 for 1-loop Neurospora, 32 for 2-loop Neurospora, 256 for 2-loop Arabidopsis and 
2048 for 3-loop Arabidopsis. In each case, a subset of these LCs are consistent with the pattern of 
activation and inhibition in the corresponding DE model. That there can be more than one such 
LC in each case is due to the choice of AND or OR where a vertex has multiple edges leading into 
it. For Neurospora, the 1-loop circuit has a unique LC consistent with the DE model while for the 
2-loop circuit there are two DE LCs. The 2- and 3-loop Arabidopsis networks yield 4 and 8 DE 
LCs respectively. 

For a given logic model, we would expect the DE LCs to most closely reproduce the DE dynam- 
ics. To test this hypothesis, we optimised LCs to synthetic experimental data obtained from the 
DE system, and then compared their predictive performance. For each LC, the match to the con- 
tinuous dynamics was quantified by finding the combinations of parameters (signalling delays and 
discretisation thresholds) that minimised a quantitative cost function. In order to be able to objec- 
tively compare the ability of the Boolean models to reproduce experimental time courses against 
that of their DE counterparts, we employed a cost function that closely mirrored those commonly 



used to optimise continuous models 18, 25, 34-36, 38|. The cost function we used measured the 



goodness-of-fit of each logic model to synthetic data generated in both 24h light-dark (LD) cycles 
and the appropriate free-running light regime (continuous dark, DD, for Neurospora; continuous 
light, LL, for Arabidopsis) . At each vertex, the cost score was calculated as the correlation between 
the discretised time series for the downstream species and the time-delayed predicted output cal- 
culated from the discretised data for the upstream species. Scores were summed across all vertices 
and light regimes to give the final cost value (see Methods for details). 

After ranking the optimised LCs by score, we then assessed whether the top ranking LCs 
comprised viable clock circuits by checking that they were capable of: (i) generating self-sustained 
oscillations with a circadian period in constant conditions; and (ii) entraining to LD cycles over a 
realistic range of photoperiods [55I ]. 

For the Neurospora and 2-loop Arabidopsis logic models, the full set of LCs were fitted to 
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synthetic data. The best-performing LCs for these networks are shown in Figs. |4]and[5]A_. It can be 
seen that the optimisation method produces a clear separation of the LCs by score. Moreover, for 
each network, one of the DE LCs is uniquely identified as the optimal circuit yielding a viable clock. 
In the case of 3-loop Arabidopsis, we considered the subset of configurations obtained by setting 
the gates common with the 2-loop Arabidopsis circuit to their optimised values. This mirrored 
the construction of the 3-loop DE model which was derived from the 2-loop system by adding an 
additional feedback loop while fixing all other interactions, and then optimising the parameters 
of the new loop ^3^ . Constraining the structure of the circuit in this fashion yielded 8 possible 
LCs (the 2 edges in the LHY-PRR loop described as activation or inhibition and the AND/OR 
interaction at LHY; see Fig. HP). Figure [3)3 shows that in this system, as for the other models, a 
DE LC emerges as the optimal circuit. 

2.3 Optimal Boolean models have biological time series characteristics 

The time series generated by the optimal configurations in LD cycles are shown in Fig. [S] The 
corresponding DE simulations are also plotted for comparison. 

In each case it is clear that the Boolean models capture the same qualitative dynamics as their 
DE counterparts. Different species are switched on and off relative to one another with phases that 
match the patterns of rising and falling expression in the corresponding continuous time series. 
Moreover, the delays between the switching times are similar to the phase differences between the 
peaks and troughs of the DE solutions. 

It should be noted that both Boolean Arabidopsis models reproduce the acute light response in 
the Y gene, as well as the circadian response in Y around dusk (cf. Figs. [Sp-H). This demonstrates 
the ability of the Boolean circuits to simulate biochemical processes that occur on different time 
scales within the same system. 

The optimal LCs give equally good matches to the DE dynamics in simulated free-running 
conditions, as can be seen in Supp. Fig. S3. 



2.4 Optimal Boolean models have biological photoperiodic behaviour 

In order to assess the extent to which the Boolean models reproduce the DE dynamics in a more 
global and quantitative fashion we compared phase changes over a range of biologically realistic 
photopcriods. In the Boolean framework, the times at which solutions switch between and 1 
emerge as natural phase measures. For continuous time courses - such as those generated by DE 
models - the point in the circadian cycle at which the expression level of a molecular species decreases 



below a set threshold can be employed as a phase marker [25|, [56|, [57[ . This suggested using the time 
at which each species decreases below its discretisation threshold as a phase measure for the DE 
simulations, and the time of the 1 — * transition as the equivalent marker in the Boolean models. 

The phasc-photoperiod relationships computed in this fashion are shown for the Neurospora 
models in Figs. [3V and B. For both networks, the photoperiodic behaviour of the Boolean and 
DE models is very close: indeed for the 1-loop network, they are exactly equivalent. Figs. [7p 
and D plot the photoperiod simulations obtained with the Arabidopsis circuits. Here too, the 
phase-photopcriod profiles are very similar, with the addition of the LHY-PRR loop to the 2-loop 
model causing a transition from a predominately dusk- locked system to a dawn-locked one [57[ . In 
particular, the Boolean 2-loop Arabidopsis circuit exactly reproduces the dual light response in the 
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Y gene, in which the acute peak tracks dawn and the circadian peak tracks dusk. This suggests 
that the logic circuits possess sufficient dynamic flexibility to perform the complex integration of 
environmental signals that is a central property of circadian systems. 



2.5 Boolean models can determine circadian network structure from ex- 
perimental data 

The success of the logic models in recovering the correct DE configurations from synthetic data 
suggested that for a fixed abstract topology, our optimisation procedure has the capacity to deter- 
mine the logic network most consistent with a given data set. We tested this finding further by 
optimising the 3-loop Arabidopsis logic circuit to highly-sampled experimental time series recorded 
using luciferase (LUC) imaging in constant light from a wild-type strain All possible LCs 

were considered, corresponding to a network inference carried out assuming no prior biological 
knowledge. The cost function optimised was the same as that used for fitting to synthetic LL data. 
As previously, viable clock circuits were taken to be those yielding autonomous limit cycles with 
circadian periods. 

The results of fitting to experimental data are presented in Fig. (5]A_. It can be seen that the 
second highest ranking LC giving a viable circuit is a DE configuration. This configuration, Gde, 
is in fact the same as that previously determined to be optimal from the synthetic Arabidopsis 
data sets. Moreover, Fig. [HJ3 shows that Gde emerges as the top ranking clock configuration if 
the regulatory structure is constrained to incorporate the central LHY-TOC1 negative feedback 
loop of the corresponding continuous model. Interestingly, the optimal configuration, Gopt, differs 
from Gde in the sign of the TOC1-LHY interaction, with TOC1 repressing LHY in Gopt as 
opposed to activating it (compare Supp. Figs. S4A and B). This result is consistent with the recent 
experimental characterisation of the LHY-TOC1 circuit as a double negative feedback loop, rather 
than the single negative loop assumed in the 2- and 3-loop DE models 
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Figures |9j3 and C show the simulations of the free- running clock obtained from the optimal and 
DE configurations respectively. For both LCs, the simulated oscillation period and timing of gene 
expression is close to that of the natural system (cf. Fig. This can be seen clearly by comparing 
the durations for which each gene is ON in the two Boolean models with the corresponding peaks 
in the continuous data. 



3 Discussion 

3.1 A new approach to quantitative circadian modelling based on Boolean 
logic 

Circadian clocks have become popular systems for studying the relationship between gene-protein 
dynamics and phenotypc. The molecular machinery underlying these networks has been relatively 
well characterised, and this has led to great interest in developing predictive computational models 
of the clock. Such models are usually formulated as sets of differential equations. The high level 
of biochemical detail afforded by this approach has allowed DE models to successfully address 
a range of issues regarding the functional relationship between the architecture of the clock and 
circadian homeostasis. One homeostatic mechanism that has been comprehensively investigated 
from a theoretical perspective is temperature compensation. By assuming that certain subsets of 
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a circuit's kinetic parameters are temperature-dependent, temperature has been incorporated into 
a number of circadian DE models, ranging from minimal circuits built on the Goodwin oscillator 
58j to more detailed formulations that explicitly model the underlying biochemical reactions 
These models have provided new insights into the possible mechanisms that lead to 
circadian mutants with altered compensation properties, as well as suggesting generic motifs that 
may facilitate the tuning of the phase-temperature relationship. 

However, clock models based on DEs are characterised by large numbers of kinetic parameters, 
the values of which are typically unknown. Optimising these to experimental data is a major 
computational bottleneck which imposes a hard bound on the maximum system size that can be 
studied. In addition, as the fitting problem is typically highly undcrdctermincd and involves noisy, 
undersamplcd experimental data, robust optimisation can require the construction of cost functions 
targeting specific qualitative features of the data, introducing a degree of arbitrariness to the fitting 



25, 



34 
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procedure [lS, 

The need for minimal clock parametrisations to address these issues has been recognised else- 
where in the literature. For example, recent Neurospora work has shown that it is possible to use a 
simple two-parameter function to represent all the intermediate processes between the expression 
of a clock gene and its action on a downstream target, whilst still maintaining sufficient flexibility 
to accurately simulate biological temperature and photoperiod responses [3, [25I . 

An alternative technique for modelling gene regulatory networks is provided by Boolean logic. 
By assuming discrete expression levels, Boolean models provide an even greater reduction in com- 
plexity, albeit at the cost of reduced biochemical precision. Previous studies have exploited this 
reduction to study GRN state-space structures 48|, [5S L 6C L and to introduce probabilistic models 



that parametrise statistical transitions between states 45 , 6l|] . The different limit cycles enumerable 
by a single Boolean GRN have been interpreted in some models as examples of cell differentiation 
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63J, making limit cycle properties of special interest. Elsewhere, Boolean studies have focused 

the immune response 



on the regulatory logic underlying transcription 
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531 ] and apoptosis 



49|. 



Here, we have presented the first models of biological clocks based on the Boolean formalism. We 
constructed logic versions of 4 DE models simulating circadian rhythms in the organisms Neurospora 
crassa and Arabidopsis thaliana. To derive the best logic representations, both regulatory structure 
(logic configuration) and parameters (signalling delays and discretisation thresholds) were optimised 
to synthetic experimental data generated from the DEs. In each case, we found that the logic 
configuration yielding the best fit to data was consistent with that of the corresponding DE system. 
This suggested that the Boolean models were capable of identifying the particular combination 
of activating and inhibiting elements best able to account for a set of experimental expression 
patterns (Figs. 0]and[5]). We confirmed this hypothesis by successfully applying our optimisation 
algorithm to real Arabidopsis LUC data (Fig. [5]). These results demonstrate that the logic models 
possess sufficient predictive power to perform biological network inference, despite their relative 
simplicity. Moreover, the optimisation to LUC data highlights the importance of network structure 
in determining dynamic behaviour 



G7, 
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This optimisation gave rise to a pair of putative 3-loop 
network architectures with distinct parameter values that generated very similar expression time 
series (Supp. Fig. S4 and Fig. [9]). One of these architectures matched the 3-loop Arabidopsis DE 
model. It therefore contains the DE model's core LHY-TOC1 negative feedback loop, which was 
based on an experimental study that demonstrated the repression of TOC1 by LHY whilst also 
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inferring the activation of LHY by TOC1 69]. Significantly, the other architecture - which gave 
the best fit to data - agreed with more recent biochemical work showing that TOC1 is in fact a 
repressor of LHY [5l|. This result is also consistent with a complementary computational study 
showing that an expanded version of the 3-loop DE model incorporating a negative TOC1-LHY 
interaction yields more accurate simulations of TOC1 knockout and overexpression mutants [70I ] . 

We also found that although the cost function used to fit synthetic data only assessed goodness- 
of-fit in simulated 12:12 LD cycles, the logic models gave very good fits to the DEs in long and short 
days also, closely reproducing the relationships between photoperiod and expression phase (Fig. [7]). 
The coordination of biochemical activity with the timing of dawn and dusk is a key system-level 
property that can be used to assess the relationship between the structure of a circadian network 
and its evolutionary flexibility 57 1. Our photoperiod simulations indicate, somewhat surprisingly, 
that this property can be accounted for by significantly reduced dynamic models possessing much 
smaller parameter sets. In particular, the combined dawn- and dusk-tracking observed in the 2- 
loop Arabidopsis DE model (Fig. [7p) is accurately replicated by its logic formulation which has 
less than 1/4 the number of parameters. A similar reduction was observed for 3-loop Arabidopsis, 
with the 80 parameters describing the DE models decreasing to 20 in its Boolean equivalent (Supp. 
Table SI). The ability of simple, discrete models to simulate biologically realistic photoentraiment 
may also have important implications for the nature of circadian signal processing, in addition 
to being interesting from a modelling perspective. Specifically, it is consistent with hypotheses 
based on experimental data suggesting that the mechanism by which daylength and light intensity 
information is transmitted to output pathways may be partly digital in nature 0|. 

It should be noted that despite the success of the logic models in reproducing light responses, 
the coarser representation of network dynamics they provide means that the reproduction of certain 
circadian properties pose problems for the Boolean approach. A particular restriction of note relates 
to the modelling of temperature compensation. Temperature can be readily introduced into any 
DE model by following the methodology originally introduced by Ruoff [l^, 0j- ^ n ^iiis scheme, 
the temperature dependence of a kinetic parameter is assumed to be governed by the Arrhenius 
relation. This leads to a simple balance equation for the corresponding activation energies and 
control coefficients which, when satisfied, guarantees a near-zero period-temperature derivative at 
the balance point. In a Boolean model, the parameters that determine the free-running period tfr 
are the discrete delays Tj, and so simple balance equations can be derived from the form of the 
function determining the dependence of tfr on the TjS. For example, in the 2-loop Neurospora 
model, it can be shown that trr = T\ + T2 + T3 + T4. Thus, compensation can be achieved simply 
by choosing the temperature dependence of each delay Tj so that their sum is constant at each 
point over the range of interest. However, as the TjS are generic parameters which in each instance 
summarise several biochemical processes, any balance equation derived in this manner will not in 
general be grounded in physical chemistry, reducing its biological relevance. 



3.2 Computational advantages of the Boolean formulation 

In addition to providing a more compact parametrisation, a significant strength of the logic mod- 
elling approach is that it greatly reduces the complexity of the optimisation procedure itself. This 
enables optimal configurations and delay-threshold combinations to be distinguished with a greater 
objectivity. The critical simplification afforded by the Boolean formulation is that the cost score is 
computed from bitstrings, which take the value or 1, rather than data that varies over a much 
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broader range of values. This removes the need for ad hoc cost function terms for normalising the 
data and robustly computing period and phase. Indeed, the cost function we employed in this 
work was extremely simple, computing the correlation between the predictions of the model and 
the corresponding discretised experimental data at each vertex. 

A second important simplification relates to the structure of the parameter space. In DE 
models, each interaction delay is dependent on processes that occur over a range of time scales 
(e.g. transcription, translation, degradation etc). This means that as well as being uncountably 
infinite, the parameter space can cover several different orders of magnitude, making it difficult 
to establish a priori bounds. By contrast, each signalling delay in a logic model is constrained to 
be a multiple of the data sampling interval t$, and is bounded above by the maximum possible 
time tuAX over which the corresponding interaction can occur. tyiAX is itself bounded by the 
duration of the experimental measurements and can be further restricted on the basis of biological 
knowledge (here, the free-running period was used to bound all delays - see the Methods section 
for details). The set of possible signalling delays is thus finite. The set of possible discretisation 
thresholds is also finite as varying the threshold for a given species will generate a set number of 
different binary time series, meaning it is only necessary to consider thresholds for which distinct 
bitstrings are obtained. 

For logic models, the parameter space as a whole is therefore finite, and can be objectively 
bounded. Furthermore, for a fixed abstract topology, the set of all possible logic configurations 
is also finite; it is simply equal to the set of possible Boolean functions. This means that it 
is, in principle, possible to comprehensively search across all possible regulatory structures and 
parameter combinations to determine the best fit to data. Such a search is impossible with DE 
systems. Furthermore, searching across different patterns of activation and inhibition for a DE 
model is often problematic as it requires some a priori assumptions to be made regarding the 
underlying biochemical mechanisms; e.g. specification of which reactions may be cooperative and 



the ranges of the corresponding Hill coefficients [18|, [25, 34 36] . In practice, however, the number 



of possible network and parameter combinations in a Boolean formulation can become too large for 
a complete search with the computational resources available, making it necessary to constrain the 
optimisation. 

3.3 Refining the optimisation protocol 

Here, in order to ensure computational tractability, we subsampled the delay-threshold space, while 
also fixing the parameters controlling the impulse light inputs. In addition, we restricted ourselves 
to a subset of logic circuits by assuming that: (i) multiple light inputs to a gene are combined with 
an OR gate; and (ii) the net light signal directly modulates the state of the gene through either 
an AND or an OR gate, depending on the free-running light regime (see Fig. [5]). Of course not 
all possible circuits will be biologically reasonable ones (e.g. any circuit for which gates with light 
inputs uniformly output or 1 would be unviable). Nonetheless, it is reasonable to expect that 
some interactions may be more accurately modelled by gates that are not of the simple AND / OR 



type 64|, meaning that better performing logic configurations may have been overlooked. 

In view of these restrictions, the fact that our optimal Boolean circuits match the dynamics of 
their target data sets, both synthetic and experimental, is very encouraging. Indeed, we anticipate 
that it should be possible to find circuit configurations and parameter sets giving good fits to 
data over a broader range of genetic and environmental perturbations. For example, our Boolean 
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3-loop Arabidopsis model shows consistency with much of the photoperiodic behaviour of its DE 
counterpart (Fig. 03). However, it does not reproduce the phase response observed in the DE model 
as photopcriod is decreased, for which some components switch between dawn- and dusk-dominance 
in a complex manner 57 j. 



A likely contributing factor is that our current optimisation method involves computing the 
score at every parameter combination over a fixed lattice. This can be inefficient, particularly 
where the stoichiometry of an interaction and/or its molecular dynamics require the density of 
interacting species to accrue beyond a certain value before the interaction is statistically likely to 
occur. In such cases, the optimal threshold choice for the discretisation is likely be found within 
a narrow band of possibilities. Furthermore, high threshold resolutions can be required to resolve 
topological degeneracies in the cost function which make optimal networks difficult to distinguish 
(see section SI. 2 of the Electronic Supplementary Material). The accuracy of the optimisation could 
thus be increased by employing logic variants of global search methods capable of providing a more 



computationally efficient exploration of the parameter space, such as evolutionary algorithms [74 1. 
This would increase the predictive power of the models, better positioning us to address features 
such as the dawn and dusk switching in shorter photopcriods observed in the 3-loop model. 

3.4 Future directions 

Finally, we note that there are many promising avenues for further developing the approaches 
introduced in this study. From a theoretical perspective, there is scope for extending established 
methods for analysing Boolean models. Of particular interest arc techniques for determining the 
simplest inequalities involving linear combinations of the time delays that are required to traverse 



a given path through the state space [44j, [52J, 53|. In the context of circadian models, this would 



involve developing general methods for deriving linear inequalities that result in free-running cycles 
with the target period. As the computational demands of optimisation grow exponentially with the 
number of parameters to be fitted, restricting the search to the corresponding solution set would 
be expected to dramatically reduce the computational load. 

More generally, hybrid logic/DE algorithms would sec Boolean methods used firstly to determine 
the optimal model configuration from data, and then to identify the regions of parameter space 
within which an equivalent DE model is likely to give a good fit. This would exploit the ability of 
logic models, demonstrated in this work, to generate a coarse, but quantitative representation of 
the system dynamics in a systematic and efficient manner. We anticipate that regulatory networks 
of much greater complexity than those considered here could be quantitively modelled using such 
an approach. 



4 Materials and methods 

4.1 Data sets used for model fitting 

In order to incorporate some of the variability in expression levels characteristic of real experimental 
data, synthetic data was generated using the variant of Gillespie's stochastic simulation algorithm 
(SSA) introduced by Gonze et al. ^3^ . Fits to deterministic data obtained from direct integration 
of the DEs gave very similar results (data not shown). However, the stochastic data sets yielded a 
clearer separation between LCs, most probably due to the lifting of topological degeneracies. For all 
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circuits, final analyses were therefore restricted to the results obtained with stochastic time courses. 

In generating the synthetic time courses, the scaling (or extensitivity) parameter Q was set 
close to the minimum value yielding self-sustained, unforced oscillations in each case. The final 
values used for simulations were: 1-loop Neurospora, Q = 25; 2-loop Neurospora, Q = 50; 2- and 
3-loop Arabidopsis, fl = 1000. Time series were generated for 5 cycles in both entrained (LD 12:12) 
and free-running conditions (DD for the Neurospora models; LL for Arabidopsis) . This choice 
mirrors the combination of experimenta l data sets typically chosen for parameter optimisation in 



computational circadian studies [18L l25l I34l - l36l . l38l| . Time series were then subsampled every 0.5h 
to give the data used for model-fitting. Plots of the synthetic LD data sets are shown in Supp. Fig. 
S5. 

Experimental gene expression levels were measured using lucifcrase (LUC) imaging, carried out 



as previously described [16(. Images were recorded using Hamamatsu C4742-98 digital cameras 
operating at -75°C under the control of Wasabi software (Hamamatsu Photonics, Hamamatsu City, 
Japan) with a sampling interval of 1.5h. Bioluminescence levels were quantified using Metamorph 
software (MDS, Toronto, Canada). The resulting expression profiles were detrended for amplitude 
and baseline damping using the mFourfit function of BRASSv3 . The detrended time series can 
be seen in Fig. [SR. 

4.2 Implementing circadian networks in Boolean logic 

For ?? biochemical species, let X, (t) e {0, 1} denote the activity of species i at time t, where t is a 
multiple, kts, of the sampling interval t$- The update rule is then expressed as: 

Xi (t) = Si (Xi (t- TjJ , ...,X n (t -r in ) ,Li (t - t n+ i) ,...,L m (t- r N+m )) . (1) 

Here the LkS represent external light inputs to the circuit and t\, . . . , T^+m are the signalling delays, 
with tu £ {ti,...,t/v} for all Assuming a 24h day and writing trjAWN and touSK for the 
times of dawn and dusk respectively, the LfcS are described by the following function: 

L ^ _ | 1 if to awn < mod (t, 24) «S mod {t DAWN + Pkl 24), 
I otherwise. 

Setting pk = creates a constant darkness signal (DD); setting pk = 24 corresponds to constant 
light (LL); setting pk = tuuSK ~ to awn yields a continuous light-dark (LD) cycle. Parameter sets 
with pk < tousK — trjAWN yield LD cycles with a pulse of length pk at dawn. 

The functions Sj are Boolean functions Sj : {0,l}" +m — » {0,1} representing the interactions 
between genes and light inputs that determine the state of species X,-. We introduce a logical 
dependency function (or logic gate) G(-,g) to describe these interactions. This function takes a 
single parameter g e {0, 1}, the value of which determines the type of reaction modelled. Formally, 
we consider two types of operator. The first operator acts on a single Boolean input Y e {0,1}, 
implementing either the identity or NOT gate, modelling activation and repression respectively: 

G 1 (Y,0) = Y 

Gi (y, 1) = NOT Y. 

The second type of operator acts on two Boolean inputs Y, Z e {0,1}, implementing cither the AND 
or the (inclusive) OR dependency. If either species can fulfil the interaction, an OR dependency is 
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used. If both species are required in the interaction, an AND dependency is used. Thus, for species 
Ye {0,1} and Ze {0,1}: 

G 2 (Y, Z, 0) = FOR Z 
G 2 (Y, Z, 1) = FAND Z. 

The functions s, in ([T]) are formed as compositions of these dependencies. For example, the 
update rule for gene TOC1 in the optimal 2-loop Arabidopsis model has the form 

XtOCI (*) = (NOT X LHY (t - n)) AND Xy (t - T 6 ) . (5) 

Using ^ and (J3) , this can be written as a composition of logic functions 

Xtoci (t) = G 2 (G x (X LHY (t - n) , 9l ) , Gi (X Y (t - r 6 ) , g 6 ) , g s ) , (6) 

with ge = and g± = gg = 1. The resulting Boolean function models a reaction in which LHY 
must be downrcgulated in order for Y to activate TOC1 production. 

A given set of interaction functions Si has an associated adjacency matrix A = (Aij) defined by 

_ I 1 if Si (Xi, . . . , Xj-i,0, Xj + i . . . , X n , L) ^ Si (Xx, . . . , Xj-x, 1, X,- + i . . . , A„, L) , 
1,7 ( otherwise, 

where L = (Li, . . . , L m ). Thus, A^ = 1 if species Xj can change the state of Xi, and so matrix A 
describes the abstract topology of a model (these topologies can be seen in Fig. [2}. It follows that a 
circadian clock model is parametrised by its adjacency matrix A, a set of delays r = (ti, . . . , Tpf +rn ) 
and a set of gates G = (gx, ■ ■ ■ ,gd)- Writing X = (Xi, . . . ,X n ), this yields the following compact, 
vectorised form of the update rule: 

X(t)=S(X(t),L(t);A,T,G). (8) 

We refer to the set of gates G as the model's logic configuration (LC). As each configuration G = 
(gx, ■ ■ ■ ,3d), is a bitstring, it can be represented uniquely by the corresponding decimal expansion 
D (G) defined below: 

D(g 1 ,...,g d ) = Y i ga d - 1 - (9) 
1=1 

For the models considered, the LCs are enumerated in terms of their decimal expansions for sim- 
plicity in Figs. HIE] and [S] 

Of the LCs consistent with A, a subset matches the pattern of activation and inhibition in 
the corresponding DE model. There are referred to as the DE LCs. For example, for the 2-loop 
Neurospora model, frq activates both isoforms of FRQ, giving gi = g 2 = 0, and both isoforms 
repress transcription, giving g$ = gx = 1 (see Fig. [TJ3 and Fig. HJ3). There are thus a pair of DE 
LCs in this case, 00110 and 00111, depending on whether both isoforms are required for repression 
(55 = 0, corresponding to OR) or a single isoform is sufficient (g$ = 1, corresponding to AND). 
These DE LCs are encoded by the integers 6 and 7 respectively. 

Finally, in order to construct the simplest logic models consistent with the general form of 
circadian DE systems, each Boolean function s,; is assumed to have the form 

Si (X, L) = Hi (sf (X) , sf (L)) , (10) 

such that: (i) H, : {0, l} 2 {0, 1} implements either the AND or OR gate; (ii) sf : {0, 1}" {0, 1} 
encodes the structure of the free-running system; and (iii) sf : {0,1}™ — > {0,1} determines how 
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multiple light inputs are integrated, with sf (0, ...,0) = and sf (1,...,1) = 1. For consis- 
tency, it is therefore necessary that setting each light input Lk to the relevant constant value L 
recovers the free-running system (L = for DD: L = 1 for LL). This condition is equivalent to 
Hi ysf (X) , i) = sf (X). For the Neurospora models, where the free-running condition is DD, the 
consistency condition is achieved using an OR gate since x OR = x. In the case of the Arabidopsis 
circuits, where the free-running condition is LL, the AND gate is appropriate as x AND 1 = x. 

For both Arabidopsis models, the multiple light inputs to Y are combined using an OR gate: this 
is because in the corresponding DE models, both inputs can independently upregulate transcription 



In addition, the 3-loop model only incorporates the pulsed light input to the PRR gene since 



removing the continuous light input from the DE system had a negligible effect on its photoperiodic 
behaviour. 

Full details of the logic formulation of each clock network are given in section S2 of the Electronic 
Supplementary Material. 



4.3 Optimisation and constraints 

In order to identify the optimal combination of signalling delays r = (Tj), thresholds T = (Ti) 
and logic gates G = (gi) for a given model and data set, we must introduce a cost function to be 
minimised. We used a simple function based on a correlation between the predicted time courses 
generated by the model and the corresponding data. Further details can be found in section Sl.l of 
the Electronic Supplementary Material. Optimal LC-parameter combinations are given in Tables 
1 and 2. 

The most comprehensive strategy in seeking to minimise the cost function is to systematically 
calculate the cost for all possible configurations and parametrisations of the model; that is all delays 
< Tj < t max j where tuAX is the longest timescale considered in the system, and all thresholds 
< Ti < 1. However, when the model becomes too complex to permit a global analysis, it becomes 
necessary to introduce further constraints to restrict the parametrisation to be considered. Such 
constraints should be as objective as possible. 

One viable, objective constraint is to limit all the signalling delays around a closed loop so that 
they sum to no more than the period t fr of free-running oscillations in the target data set. 

For 1-loop Neurospora, this gives the single delay bound 

Ti + t 2 «S t fr , (11) 

where t fr = 22h. 

For 2-loop Neurospora, the constraint results in 2 delay bounds 

t 1+ t 3 ^t fr , ^ 

where again tfr = 22h. 

For 2-loop Arabidopsis, there are 3 delay bounds: 

Ti + T 2 + r 3 s: t fr , 

T 5 + T 6 J? T FR , (13) 
T2 + T 3 + T 4 + T 6 T FR . 

In this case t fr = 25h. 
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For 3-loop Arabidopsis, there are 4 delay bounds: those for the 2-loop model above together 
with an extra bound introduced by the addition of the LHY-PRR loop: 



T7 + T 8 T F R. (14) 

For this circuit, tfr = 24h. 

Two further constraints were introduced for reasons of computational tractability. Firstly, each 
delay Tj was restricted to integer multiples of a minimum delay resolution tr, itself a multiple k T ts 
of the data sampling interval t$. Secondly, each threshold Tj was bounded within a subinterval 
\Tmin ,Tmax\ of [0, 1], and also restricted to integer multiples of a minimum threshold resolution 
Tr. 

4.3.1 Optimisation to synthetic data 

For all models, Tmin and Tmax were set to the values 0.2 and 0.8 respectively. For the Neurospora 
models, k T was set to 1 and Tr to 0.025; for the Arabidopsis models, k T was initially set to 2 
and Tr to 0.05. Scores were then recalculated with k T = 1 and Tr = 0.025, within intervals 
|Yi — 3rji,Ti + 3tr] and [Tj — 5Tr,T, + 5Tr] centred around the parameter combinations giving 
the best scores. For Arabidopsis optimisations, light parameters controlling the impulse inputs {L\ 
and Z/3 in the 2-loop circuit; Li,L^ and L4 in the 3-loop circuit) were fixed at the values shown 
in Table [TJ These were determined from discrete approximations to the corresponding continuous 
curves in the DE models. 

Each parameter set was then checked for a functioning circadian clock. Firstly, the solution 
to the free-running model was generated under the appropriate continuous light conditions, using 
the discretiscd data as initial conditions. If a limit cycle was obtained with period within 20% of 
the DE model, this was fed into the model under simulated 12:12 LD cycles. Periodic, entrained 
oscillations were taken to indicate a viable clock circuit. 

For the Neurospora and 2-loop Arabidopsis models, optimisations included all possible LCs. The 
3-loop Arabidopsis optimisations were restricted to the subset of LCs defined by G = (lOOHOllxyz), 
x,y, z e {0, 1}; these are the configurations that result from fixing gates g\, g-j to their optimised 
values in the 2-loop circuit. 

The photoperiod simulations shown in Fig. [7] were obtained by locally rcoptimising the logic 
circuits yielding viable clocks to simulations of the DE models under 12:12 LD cycles, and then 
calculating the maximum symmetric photoperiod interval (12 — Pmax, 12 + Pmax) over which 
both model formulations generated stable, entrained solutions. The cost was minimised with a 



simulated annealing algorithm 18|, [3J, [75j , using a cost function for which the data and predicted 
time courses were taken to be single cycles of the entrained solution in the DE and logic models 
respectively 

4.3.2 Optimisation to experimental LUC data 

In fitting the 3-loop Arabidopsis model to the LUC data shown in Fig. genes were matched to 
model variables in the following manner: (i) CCAl was equated to LHY on the basis that the LHY 



variable in the equivalent DE model groups together the effect of the two genes [36|; (ii) TOC1 was 
matched to TOC1; (iii) GI was matched to Y owing to ex per imental results showing that this gene 
can account for some of the action of Y in the DE model [36| ; (iv) PRR9 was matched to PRR, as 
this variable combines the PRR7 and 9 genes in the DE formulation 
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Since the exact biological correlate of variable X is currently unknown, it was not used for 
costing the fit. Consequently, gi and T2 were both fixed at (note that fixing gi reduces the 
number of possible LCs from 2048 to 1024). This preserves the dynamics of all components in the 
model excluding X, together with the delay bounds used for constraining the parameter space. It 
also means that TOC1 and the dummy variable X have identical time series. Thus, in practice, the 
discretised TOC1 expression time series was used as a proxy for X data when calculating the cost 
at the LHY vertex. Also, as the LUC data was measured in LL, the parameters p\ — > controlling 
the light inputs were all set to 24 (cf. eqn. (|2j)). In LL conditions, the absence of dawn and dusk 
means that the choice of the delay parameters Tg — > t\i is arbitrary. We therefore set the values 
of these delays to 0. Tmin and Tmax were fixed at 0.2 and 0.8 respectively. k T was initially set 
to 2 and Tr to 0.2. Scores were then recalculated with k T = 1 and Tr = 0.05, within intervals 
[r,; — 2tr, Ti + 2tr] and [Tj — 2Tr, Tj + 2Tr] centred around the best scoring parameter sets. The 
optimal parameter set for each LC was assessed to determine whether it yielded a viable clock by 
first generating the solution to the model using the discretised LUC time series as initial conditions, 
and then checking that this gave a limit cycle with free-running period within 20% of 24h. 



4.4 Software 

The numerical routines for parameter optimisation and model simulation were initially developed 
in MATLAB (Mathworks, Cambridge) and C. The scoring algorithms used for global parameter 
sweeps were subsequently converted into Java and run on a task farm computer consisting of 
118 Intel Harpcrtown quad-core processors. All software used is available on request from the 
corresponding author. 
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Figure legends 



Figure 1. Circuit diagrams for the clock models. Genes are boxed and arrows denote 
regulatory interactions. Diamonds represent light inputs. A. The singledoop Neurospora model 
[191 ]. FRQ protein represses production of frq transcript. Lig ht acts on the network by uprcgulating 
frq transcription. B. The two-loop Neurospora model [18[. Two isoforms of FRQ are produced 
which both repress frq transcription. Light uprcgulates frq as in A. C. The two-loop Arabidopsis 
model 35(. TOC1 activates its repressor LHY (combining LHY and CCA1) indirectly through 



a hypothetical gene X, forming the central negative feedback loop of the circuit. LHY is directly 
upregulated by light while light indirectly activates TOC1 via a second hypothetical gene Y, posited 
to have two distinct light inputs. Y activates TOC1 transcription and TOC1 represses Y, forming 
a second, interlocked feedback loop. D. The three-loop Arabidopsis model 36(. The additional 
PRR gene (combining PRR7 and PRR9) is light-activated and represses LHY transcription. LHY 
uprcgulates PRR, creating a third feedback loop. 



Figure 2. The abstract topologies for the logic representations of the clock circuits 
shown in Fig. QJ Genes are boxed and arrows denote regulatory interactions. A. The single-loop 
Neurospora model. B. The two-loop Neurospora model. C. The two-loop Arabidopsis model. D. 
The three-loop Arabidopsis model. Numerals I index logic gates gi e {0, 1} that can be varied to 
generate different regulatory structures. Numerals at the end of an arrow index the single- input 
gate defining that interaction. Numerals within boxes index logic gates governing double-input 
interactions. Diamonds represent light inputs, with the corresponding fixed gates in ovals indicating 
how these affect the target species (e.g. in C, and L3 are combined with an OR gate after which 
the resulting bitstring is combined with the output of Y through an AND gate - see the Methods 
sections for full details). TjS represent the circuit delays and T^s the discretisation thresholds used 
to fit continuous data. 



Figure 3. The number of parameters required for each clock configuration as DE and 
logic models. 

Figure 4. The results of exploring the logic configurations belonging to the abstract 
topologies of the 1-loop (A) and 2-loop (B) Neurospora models. Cost scores are shown for 
the optimal fit of each LC to synthetic data. The LCs are indexed by their decimal representations 
for brevity (see Methods for details). Here, a score of indicates the best fit and a score of 1 the 
worst fit. Triangles indicate LCs for which the Boolean model yields a viable clock. LCs mirroring 
the activation and inhibition pattern of the corresponding DE models in Figs. [TJA and B are plotted 
in red. In A, one such LC mirrors the corresponding DE model, G = (01), and this emerges as 
the optimal configuration yielding a viable clock. In B, only LCs yielding scores less than 0.75 are 
shown. There are two that mirror the equivalent DE model. One of these, G = (00111), is identified 
as the optimal configuration giving a viable clock (leftmost red triangle). In this LC, either of the 
FRQ isoforms can independently inhibit transcription (the corresponding two-input gate is of the 
AND type). 

Figure 5. The results of exploring the logic configurations belonging to the abstract 
topologies of the 2-loop (A) and 3-loop (B) Arabidopsis models. Cost scores are shown 
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for the optimal fit of each LC to synthetic data. As in Fig. |4j each LC is indexed by its decimal 
expansion. Scores of and 1 indicate the best and worst fits respectively. Triangles denote LCs for 
which the Boolean model yields a viable clock. LCs mirroring the activation and inhibition pattern 
of the corresponding DE models in Figs. [T]C and D are plotted in red. In A, only LCs yielding scores 
less than 0.75 are shown. Of these, there are 3 LCs consistent with the activation and inhibition 
pattern of the equivalent DE model from a possible total of 4. From this subset, G = (10011011) 
emerges as the optimal configuration yielding a viable clock. For this circuit, both LHY and TOC1 
can independently inhibit Y while TOC1 is repressed unless LHY is repressed while Y is activated 
(the corresponding gates are both of the AND type). In B, only the 8 LCs obtained by varying 
the gates of the PRR-LHY loop were considered: all other gates were fixed to those of the optimal 
2-loop configuration. Of these 8 possible circuits, two LCs were consistent with the equivalent DE 
model, from which G = (10011011011) is identified as the optimal configuration giving a viable 
clock. This LC corresponds to a clock network in which LHY is repressed unless PRR is inactive 
and X is active (the corresponding gate is of the AND type) . 

Figure 6. Time series for the differential equation and Boolean versions of the clock 
models in 12:12 LD cycles. Two 24h cycles are plotted for each model. A, B: 1-loop Neurospora; 
C, D: 2-loop Neurospora; E, F: 2-loop Arabidopsis; G, H: 3-loop Arabidopsis. Differential equation 
time series (left panels) have been normalised to lie between and 1 in order to facilitate comparison 
with the Boolean simulations (right panels). Different components within a model are slightly offset 
from one another so they can be distinguished more easily. The time step used for solving the 
Boolean models was 0.5h, equal to the data sampling interval. 



Figure 7. Comparing the photoperiodic behaviour of the Boolean and DE versions of 
each model. For Boolean models, the phase of each species is taken as the time within the LD 
cycle of the ON to OFF transition (downward triangles). Analogously, the DE model phases are 
defined as the times at which species decrease below the thresholds yielding the optimal fit of the 
corresponding Boolean circuit to data (upward triangles). Shaded areas of plots, darkness; open 
areas, light. A. 1-loop Neurospora. The phase-photoperiod profiles are coincident, indicating that 
the Boolean model exactly reproduces the photoperiodic behaviour of its DE counterpart: FRQ 
transcript and protein are both locked to dusk across the photopcriod range. B. 2-loop Neurospora. 
The phase plots are almost exactly equal, except for shorter photoperiods where they differ by the 
data sampling interval. As for A, all components arc locked to dusk. C. 2-loop Arabidopsis. The 
Boolean and DE models exhibit very similar patterns of dawn- and dusk-locking across genes. The 
two Y phase-photoperiod profiles reflect the double peak observed in this component (Fig. 03) 
which gives rise to a (dawn- locked) light-induced peak and a (dusk-locked) circadian peak 57J. D. 
3-loop Arabidopsis. The phase plots are similar, with all components predominately dawn-locked. 
As for 2-loop Arabidopsis, the two Y profiles reflect the double peak observed for this gene (Fig. 
IPO- 



Figure 8. Identifying the logic configurations of the 3-loop Arabidopsis model giving 
the best fits to experimental data. Plotting conventions are as described in Figs. S]and[5] A 
shows the top ranking LCs . The black arrow indicates the optimal configuration yielding a viable 
clock, Gopt = (10101011011). For this circuit, LHY and TOC1 repress each other, in agreement 
with recent biochemical evidence [sjj . The red arrow denotes the second highest ranking viable 
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LC, Gde = (10011011011). This LC matches the regulatory structure of the DE model, and was 
previously identified as the configuration yielding the best fit to synthetic data (see Fig. [SJ3). B 
plots the top ranking LCs obtained under the constraint that LHY represses TOC1 and TOC1 
activates LHY (i.e. g\ = 1 and 52 = ff3 = 0; see Fig. 03). Under this assumption, Gde emerges as 
the optimal clock circuit. The circuit diagrams for Gopt and Gde can be seen in Supp. Fig. S4. 

Figure 9. Simulations generated by the optimal fits of the 3-loop Arabidopsis model 
to experimental data. A. Experimental expression profiles for the genes CCAl, TOC1, GI and 
PRR9 in free-running conditions (LL). Expression levels were determined using LUC reporter gene 
imaging constructs and have been normalised to lie between and 1. B. The equivalent Boolean 
time series generated by the logic configuration Gopt yielding the best fit to data. C. Boolean 
expression profiles for the highest ranked configuration Gde incorporating the central LHY-CCA1 
negative feedback loop of the DE model. In all plots, different components arc slightly offset from 
one another so they can be distinguished more easily. The time step used for solving the logic 
model was 1.5h, equal to the data sampling interval. 
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Tables 



Tabic 1: Optimal parameter sets - synthetic data. 





One-loop 


Two-loop 


Two-loop 


Three-loop 




Neurospora 


Neurospora 


Arabidopsis 


Arabidopsis 


G 


01 


00111 


10011011 


10011011011 


Ti (h) 


5 (5) 


5 (5.5) 


1.5 (3) 


(3) 


T 2 (h) 


6.5 (6.5) 


1.5 (2) 


5.5 (6) 


5.5 (6.5) 


t 3 (h) 


7.5 (9) 


6 (6.5) 


6.5 (7.5) 


7(8) 


r 4 (h) 




10 (9) 


(0.5) 


0(1) 


r 5 (h) 


- 


9 (9) 


7.5 (6) 


8(5) 


T6 (h) 


- 


- 


4(4) 


5 (3.5) 


r 7 (h) 


- 


- 


0(1) 


4.5 (6) 


t 8 (h) 


- 


- 


2.5 (0.5) 


6(5) 


(h) 


- 


- 


1(0) 


1(0) 


no (h) 


- 


- 


- 


1(3) 


m (h) 


- 


- 


- 


0.5 (0) 


ri2 (h) 


- 


- 


- 


3(4) 


Ti 


0.35 (0.40) 


0.425 (0.425) 


0.250 (0.450) 


0.1250 (0.4350) 


T 2 


0.40 (0.70) 


0.525 (0.650) 


0.375 (0.775) 


0.3000 (0.6850) 


T 3 




0.250 (0.350) 


0.575 (0.750) 


0.6250 (0.7325) 


T 4 






0.150 (0.325) 


0.2250 (0.1295) 


T 5 








0.8250 (0.9500) 


Pi (h) 


P 


P 


2 


2 


f>2 (h) 




P 


P 


P 


f>3 (h) 






0.5 


0.5 


Pa (h) 








3 



The logic configurations, G, delays, Tj, and discretisation thresholds, Ti (0 < Tj < 1), yielding the 
best fit of each logic model to synthetic time series. The values used for photopcriod simulations - 
obtained by fitting directly to DE time series - are shown in brackets. For each model, pk 
indicates the parameter used to simulate light input Lk through equation @. These were fixed at 
the values shown, with P denoting the photoperiod tauSK — tDAWN ■ 
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Table 2: Optimal parameter sets - experimental data. 



G 


10101011011 (Gopt) 


10011011011 (G DE ) 


n (h) 


1.5 


1.5 


T2 (h) 








^3 (h) 


1.5 


10.5 


T 4 (h) 


9 





r 5 (h) 


6 


7.5 


re (h) 


4.5 


4.5 


r 7 (h) 


1.5 


3 


t 8 (h) 


10.5 


6 


T9 (h) 








no (h) 








ni (h) 








712 (h) 








Ti 


0.30 


0.35 


T 2 


0.40 


0.35 


T 3 






T 4 


0.15 


0.10 


r 5 


0.50 


0.45 


Pi (h) 


24 


24 


Vi (h) 


24 


24 


(h) 


24 


24 


(h) 


24 


24 



The logic configurations, G, delays, Tj, and discretisation thresholds, Ti (0 < Ti < 1), yielding the 
top two fits of the 3-loop Arabidopsis logic model to experimental LUC time series recorded in 
LL. Gopt is the highest-scoring LC. Gde is the second highest scoring LC, and is also the top 
ranked configuration under the constraint that LHY represses TOC1 and TOC1 promotes LHY 
production. Note that Gde was previously identified as the LC giving the optimal fit to synthetic 
data generated by the equivalent DE model (see Table [1} . 
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